Neuron Merging: Compensating for Pruned Neurons (Appendix)

Neural Information Processing Systems

This appendix provides the mathematical proofs of the theoretical results and additional experimental results for our paper "Neuron Merging: Compensating for Pruned Neurons," accepted at the 34th Conference on Neural Information Processing Systems. The overall derivation is the same as in Section 3.1. If there exists a column with more than one strictly positive entry, then Eq. 13a does not hold. For notational simplicity, the subscript i is omitted. In Table 4, we present the test results of VGG-16 and ResNet-34 on ImageNet. For ResNet-34, we prune all convolution layers in equal proportion. For example, '1-1' indicates the first pruned layer in the first residual block.
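The compensation idea behind neuron merging can be illustrated with a minimal sketch. Assuming two fully connected linear layers and a pruned neuron whose incoming weight vector is a scalar multiple of a kept neuron's (the function name, shapes, and the linear/no-activation setting are illustrative assumptions, not the paper's exact procedure):

```python
import numpy as np

def merge_pruned_neuron(W1, W2, pruned, kept, scale):
    """Remove neuron `pruned` from the first layer and compensate by folding
    `scale` times its outgoing weights into the kept neuron's column,
    assuming W1[pruned] == scale * W1[kept] (linear layers, no bias)."""
    W2 = W2.copy()
    # Route the pruned neuron's downstream contribution through the kept one.
    W2[:, kept] += scale * W2[:, pruned]
    # Drop the pruned neuron's incoming row and outgoing column.
    W1 = np.delete(W1, pruned, axis=0)
    W2 = np.delete(W2, pruned, axis=1)
    return W1, W2
```

For purely linear layers this merge is exact: the network's output is unchanged, so no fine-tuning is needed to recover it.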


A Discrete Perspective Towards the Construction of Sparse Probabilistic Boolean Networks

Fok, Christopher H., Wong, Chi-Wing, Ching, Wai-Ki

arXiv.org Artificial Intelligence

Boolean Network (BN) and its extension Probabilistic Boolean Network (PBN) are popular mathematical models for studying genetic regulatory networks. BNs and PBNs are also applied to model manufacturing systems, financial risk and healthcare service systems. In this paper, we propose a novel Greedy Entry Removal (GER) algorithm for constructing sparse PBNs. We derive theoretical upper bounds for both existing algorithms and the GER algorithm. Furthermore, we are the first to study the lower bound problem of the construction of sparse PBNs, and to derive a series of related theoretical results. In our numerical experiments based on both synthetic and practical data, GER gives the best performance among state-of-the-art sparse PBN construction algorithms and outputs the sparsest possible decompositions on most of the transition probability matrices tested.
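The construction problem can be sketched as decomposing a column-stochastic transition probability matrix P into a convex combination of BN transition matrices (0/1 matrices with exactly one 1 per column). The greedy scheme below picks, for each column, the row carrying the largest remaining mass; it is a generic greedy decomposition for illustration, not the paper's GER algorithm:

```python
import numpy as np

def greedy_decompose(P, tol=1e-12):
    """Greedily decompose a column-stochastic matrix P into a convex
    combination sum_j q_j * A_j, where each A_j is a BN transition matrix
    (exactly one 1 per column). Returns a list of (q_j, A_j) pairs."""
    R = P.astype(float).copy()
    terms = []
    while R.max() > tol:
        # For each column, select the row with the largest remaining mass.
        cols = np.arange(R.shape[1])
        rows = R.argmax(axis=0)
        # The coefficient is limited by the smallest selected entry.
        q = R[rows, cols].min()
        A = np.zeros_like(R)
        A[rows, cols] = 1.0
        terms.append((q, A))
        R -= q * A
        R[R < tol] = 0.0   # clean numerical dust
    return terms
```

Each iteration zeroes at least one positive entry (the one attaining the minimum), so the loop terminates after at most nnz(P) terms; the number of terms produced is exactly the sparsity measure such algorithms try to minimize.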


Efficient estimation of AUC in a sliding window

Tatti, Nikolaj

arXiv.org Machine Learning

In many applications, monitoring area under the ROC curve (AUC) in a sliding window over a data stream is a natural way of detecting changes in the system. The drawback is that computing AUC in a sliding window is expensive, especially if the window size is large and the data flow is significant. In this paper we propose a scheme for maintaining an approximate AUC in a sliding window of length $k$. More specifically, we propose an algorithm that, given $\epsilon$, estimates AUC within $\epsilon / 2$, and can maintain this estimate in $O((\log k) / \epsilon)$ time, per update, as the window slides. This provides a speed-up over the exact computation of AUC, which requires $O(k)$ time, per update. The speed-up becomes more significant as the size of the window increases. Our estimate is based on grouping the data points together, and using these groups to calculate AUC. The grouping is designed carefully such that ($i$) the groups are small enough, so that the error stays small, ($ii$) the number of groups is small, so that enumerating them is not expensive, and ($iii$) the definition is flexible enough so that we can maintain the groups efficiently. Our experimental evaluation demonstrates that the average approximation error in practice is much smaller than the approximation guarantee $\epsilon / 2$, and that we can achieve significant speed-ups with only a modest sacrifice in accuracy.
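For reference, the exact per-update computation the paper accelerates can be sketched as follows. This is the naive baseline (recomputing AUC over the whole window on every arrival), not the paper's grouping-based approximation; the function name and streaming interface are illustrative assumptions:

```python
import bisect
from collections import deque

def sliding_auc_stream(stream, k):
    """Exact AUC over the last k (score, label) pairs, recomputed from
    scratch on every update -- the expensive baseline the paper speeds up.
    Labels are 0/1; score ties count as half a concordant pair."""
    window = deque(maxlen=k)
    aucs = []
    for score, label in stream:
        window.append((score, label))
        pos = sorted(s for s, y in window if y == 1)
        neg = sorted(s for s, y in window if y == 0)
        if pos and neg:
            concordant = 0.0
            for s in pos:
                lo = bisect.bisect_left(neg, s)    # negatives strictly below s
                hi = bisect.bisect_right(neg, s)   # ties handled as 1/2
                concordant += lo + 0.5 * (hi - lo)
            aucs.append(concordant / (len(pos) * len(neg)))
        else:
            aucs.append(None)  # AUC undefined without both classes
    return aucs
```

Each update costs O(k log k) here, which makes the O((log k)/epsilon) maintenance cost of the approximate scheme attractive for large windows.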


PU Learning for Matrix Completion

Hsieh, Cho-Jui, Natarajan, Nagarajan, Dhillon, Inderjit S.

arXiv.org Machine Learning

In this paper, we consider the matrix completion problem when the observations are one-bit measurements of some underlying matrix M, and in particular the observed samples consist only of ones and no zeros. This problem is motivated by modern applications such as recommender systems and social networks where only "likes" or "friendships" are observed. The problem of learning from only positive and unlabeled examples, called PU (positive-unlabeled) learning, has been studied in the context of binary classification. We consider the PU matrix completion problem, where an underlying real-valued matrix M is first quantized to generate one-bit observations and then a subset of positive entries is revealed. Under the assumption that M has bounded nuclear norm, we provide recovery guarantees for two different observation models: 1) M parameterizes a distribution that generates a binary matrix, 2) M is thresholded to obtain a binary matrix. For the first case, we propose a "shifted matrix completion" method that recovers M using only a subset of indices corresponding to ones, while for the second case, we propose a "biased matrix completion" method that recovers the (thresholded) binary matrix. Both methods yield strong error bounds --- if M is n by n, the Frobenius error is bounded as O(1/((1-rho)n)), where 1-rho denotes the fraction of ones observed. This implies a sample complexity of O(n\log n) ones to achieve a small error, when M is dense and n is large. We extend our methods and guarantees to the inductive matrix completion problem, where rows and columns of M have associated features. We provide efficient and scalable optimization procedures for both the methods and demonstrate the effectiveness of the proposed methods for link prediction (on real-world networks consisting of over 2 million nodes and 90 million links) and semi-supervised clustering tasks.
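The spirit of the "biased" approach can be sketched as a weighted low-rank fit: observed ones get full weight, while all unobserved entries are treated as zeros with a small weight. The sketch below uses plain gradient descent on a factorization U V^T in place of the paper's nuclear-norm formulation; all names and hyperparameters are illustrative assumptions:

```python
import numpy as np

def biased_mf(Y, rank=2, w_pos=1.0, w_neg=0.1, lr=0.05, reg=0.01,
              iters=500, seed=0):
    """Biased matrix-factorization sketch for PU data.  Y holds the
    observed ones (everything else is 0).  Observed entries are weighted
    by w_pos, unobserved entries by the smaller w_neg, so missing ones
    are only softly pushed toward zero."""
    rng = np.random.default_rng(seed)
    n, m = Y.shape
    U = 0.1 * rng.standard_normal((n, rank))
    V = 0.1 * rng.standard_normal((m, rank))
    W = np.where(Y > 0, w_pos, w_neg)      # per-entry confidence weights
    for _ in range(iters):
        E = W * (U @ V.T - Y)              # weighted residual
        U -= lr * (E @ V + reg * U)
        V -= lr * (E.T @ U + reg * V)
    return U @ V.T
```

With w_neg < w_pos, entries whose ones were hidden can still score high through the shared low-rank structure, which is exactly the behavior the recovery guarantees formalize.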


Linear Algebra Approach to Separable Bayesian Networks

Asavathiratham, Chalee

arXiv.org Artificial Intelligence

Separable Bayesian Networks, or the Influence Model, are dynamic Bayesian Networks in which the conditional probability distribution can be separated into a function of only the marginal distributions of a node's neighbors, instead of the joint distributions. In terms of modeling, separable networks have made possible a significant reduction in complexity, as the state space is only linear in the number of variables on the network, in contrast to a typical state space which is exponential. In this work, we describe the connection between an arbitrary Conditional Probability Table (CPT) and separable systems using linear algebra. We give an alternate proof of the equivalence of sufficiency and separability. We present a computational method for testing whether a given CPT is separable.
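The linear state-space reduction can be sketched concretely: in a separable network, one update step only needs each node's current marginal, a row-stochastic weight over its neighbors, and a local stochastic matrix. The function below is an illustrative sketch of such an update, not the paper's exact formulation:

```python
import numpy as np

def influence_step(marginals, weights, local):
    """One update of an influence-model-style sketch: node i's next marginal
    is its local column-stochastic matrix applied to a weighted average of
    its neighbours' current marginals.  Only one marginal vector per node is
    tracked, so the state space grows linearly in the number of nodes."""
    n = len(marginals)
    out = []
    for i in range(n):
        # Convex combination of neighbour marginals (weights[i] sums to 1).
        mix = sum(weights[i][j] * marginals[j] for j in range(n))
        out.append(local[i] @ mix)
    return out
```

Because every step is linear in the marginals, long-run behavior reduces to powers of a single stochastic "influence" matrix, which is what makes the linear-algebraic treatment of CPT separability natural.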